Nonparanormal Distributions & Causal Inference with Single-Cell RNA-Seq Data
Abstract
Background. Single-cell RNA-Seq is a recent technique that measures gene expression levels in individual cells. We would like to use single-cell RNA-Seq data to learn genetic regulatory networks. This is a natural task for causal-model structure-learning algorithms, which aim to learn the causal relationships between the measured variables. Causal algorithms perform poorly in high dimensions unless the data are Gaussian, and single-cell RNA-Seq data are non-Gaussian. However, the "nonparanormal SKEPTIC" method extends causal algorithms to high-dimensional Gaussian copula distributions, which may better approximate single-cell RNA-Seq data.
Aim. To learn a genetic regulatory network by applying the SKEPTIC to real single-cell gene expression data, validating against known regulatory interactions.
Data. Expression levels of 24,175 genes in 934 mouse embryonic stem cells were measured using inDrop single-cell RNA-Seq. 500 high-variance genes, including 120 transcription factors, were selected for network recovery.
Method. The covariance matrix over the single-cell RNA-Seq data was estimated using the SKEPTIC and passed as input to causal algorithms, producing a graph over all measured genes. Performance was evaluated on (a) a set of known transcription factor binding relationships from ChIP-Seq studies, and (b) regulatory effects learned from loss-of-function/gain-of-function experiments.
Results. Previous studies did no better than chance at identifying adjacencies for eukaryotic organisms. Applying the SKEPTIC to single-cell data and using FGS for structure learning, we identified adjacencies with 22.5% precision, a 14× improvement over chance (p < 10^-45).
Conclusion. Single-cell RNA-Seq data may be used for automatic, accurate recovery of the genetic regulatory network. These networks help to organize everything from embryonic development to cancer progression. Thus, these methods can be applied in both developmental genetics and personalized cancer medicine.
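The covariance-estimation step named in the Method can be illustrated with a minimal sketch. It assumes the Spearman-based form of the nonparanormal SKEPTIC estimator, in which each rank correlation rho is mapped to 2·sin(π·rho/6). The abstract does not say whether the Spearman or Kendall variant was used, and the function name `skeptic_correlation` and the toy data are hypothetical, not taken from the paper.

```python
import numpy as np
from scipy.stats import rankdata


def skeptic_correlation(X):
    """Rank-based correlation estimate for a Gaussian copula (nonparanormal) model.

    X is an (n_samples, n_variables) array, e.g. cells by genes.
    Assumption: Spearman variant of the SKEPTIC; the source does not
    state which rank statistic the authors used.
    """
    X = np.asarray(X, dtype=float)
    # Rank each column separately; ties receive their average rank.
    ranks = rankdata(X, axis=0)
    # Pearson correlation of the ranks is Spearman's rho.
    rho = np.corrcoef(ranks, rowvar=False)
    # Map rho to the latent Gaussian correlation scale.
    S = 2.0 * np.sin(np.pi * rho / 6.0)
    np.fill_diagonal(S, 1.0)
    return S


# Toy check: monotone distortions of Gaussian data leave the estimate unchanged.
rng = np.random.default_rng(0)
Z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=5000)
X = np.column_stack([np.exp(Z[:, 0]), Z[:, 1] ** 3])
S = skeptic_correlation(X)
```

Because the estimate depends only on ranks, it is the same for `X` and for the undistorted `Z`, which is the property that makes it suitable for non-Gaussian marginals. Two caveats apply when feeding the result to a structure-learning algorithm: the matrix is not guaranteed to be positive semidefinite, and the many zero counts typical of single-cell data produce ties that this sketch handles only through average ranks.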
Similar resources
A Graph-Based Clustering Approach to Identify Cell Populations in Single-Cell RNA Sequencing Data
Introduction: The emergence of single-cell RNA-sequencing (scRNA-seq) technology has provided new information about the structure of cells, along with very high-resolution data on the expression of different genes in each cell at a single time point. One of the main uses of scRNA-seq is data clustering based on expressed genes, which sometimes leads to the detection of rare cell populations. ...
I-42: Origins and Differentiation of Somatic Progenitors of The Mammalian Gonad Revealed by Single Cell RNA-Seq
I-13: Transcriptome Dynamics of Human and Mouse Preimplantation Embryos Revealed by Single Cell RNA-Sequencing
Background: Mammalian preimplantation development is a complex process involving dramatic changes in the transcriptional architecture. However, the crucial transcriptional network and the key hub genes that regulate the progression of preimplantation embryos remain unclear. Materials and Methods: Through single-cell RNA-sequencing (RNA-seq) of both human and mouse preimplantation embryos, ...
Approximate inference of gene regulatory network models from RNA-Seq time series data
Inference of gene regulatory network structures from RNA-Seq data is challenging due to the nature of the data, as measurements take the form of counts of reads mapped to a given gene. Here we present a model for RNA-Seq time series data that applies a negative binomial distribution for the observations, and uses sparse regression with a horseshoe prior to learn a dynamic Bayesian network of in...